Skip to content

fix: harden reassociation barriers for fast-math#1276

Open
DiamonDinoia wants to merge 1 commit intoxtensor-stack:masterfrom
DiamonDinoia:fix/nearbyint-fastmath
Open

fix: harden reassociation barriers for fast-math#1276
DiamonDinoia wants to merge 1 commit intoxtensor-stack:masterfrom
DiamonDinoia:fix/nearbyint-fastmath

Conversation

@DiamonDinoia
Copy link
Contributor

There is a but in nearbyint -fassociative-math breaks it as it does not define __FAST_MATH__.
Also the barrier used was causing a stack spill.

I centralized a barrier function that we can use everywhere in the code and used it in the places I know it helps.

Now we can just use reassociation_barrier to avoid compiler reordering of instructions.

Let me know if you like the internal API and if you need changes to it. I find that this is the solution that minimizes ifdef boilerplate and allows to dispatch to all archs. (With c++17 this will be simpler).

Cheers,
Marco

@DiamonDinoia DiamonDinoia force-pushed the fix/nearbyint-fastmath branch 4 times, most recently from 91b4081 to f9a9992 Compare March 23, 2026 19:16
@DiamonDinoia DiamonDinoia force-pushed the fix/nearbyint-fastmath branch from f9a9992 to 2f9a431 Compare March 23, 2026 19:55
@DiamonDinoia DiamonDinoia changed the title fix: harden reassociation barriers for fast-math nearbyint fix: harden reassociation barriers for fast-math Mar 23, 2026
@DiamonDinoia
Copy link
Contributor Author

@serge-sans-paille this is also ready for review.

cd _build && cmake --build .
- name: Testing xsimd
run: |
cd _build && ./test/test_xsimd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to split that part in a separate commit (even PR!). It's independent from the fast-math issue, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we can, though the fast math part tough will fail without this PR.

cd _build && cmake .. -DBUILD_TESTS=ON -DDOWNLOAD_DOCTEST=ON -DBUILD_BENCHMARK=ON -DBUILD_EXAMPLES=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_CXX_FLAGS="/arch:AVX2" -G Ninja
- name: Build
run: |
cd _build && cmake --build .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could be cmake --build -B _build

- name: Setup Ninja
run: |
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install ninja
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given the redundancy between that job and the one above, it seems better to use github strategy.

XSIMD_INLINE void reassociation_barrier(T& x, memory_barrier_tag) noexcept;

template <class T, class A>
XSIMD_INLINE void reassociation_barrier(T& x, A const&) noexcept;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this dispatching based on the architecture?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the usage, I wonder if we should enforce a string literal as last argument, that explains the reason why we generate that barrier. the string would be ignored, but it enforces the documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because arm/neon have different asm instructions or might not be supported at all. I just thought this would be a way to avoid ifdefs. Potentially reassociation_barrier can be guarded by xsimd_with_arch and we get a different function with different ifdefs. I have not investigated where to put such function (in which header) and how the code looks like. This felt xsimd idiomatic.

I am okay with a [[maybe_unused]] string literal argument to this function for documentation purposes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants